Parking Violations in NYC

Data

For this assignment, we are going to investigate data on parking violations in NYC.

Parking violations in 2020/21

NYC Open Data has data on all parking violations issued in NYC since 2014. The updated dataset provided for 2021 currently includes about 10 million observations. To make the assignment manageable, I have reduced it to a subset of tickets issued in from Jan 2020 to Jan 2021 and by Manhattan precincts only, yielding about 2.2M tickets.

Two support files are also included in the parking sub folder:

  • the descriptions of all variables
  • the dictionary of violation codes

Police Precincts

A second data source is the shape files of police precincts in NYC.

Exercise

1. Data exploration

Before focusing on the spatial part of the data, let’s explore the basic patterns in the data.

a) Violation Code and Fine Amounts

Add the violation code descriptions and fine amounts to the data file. Provide a visual overview of the top 10 most common types of violations (feel free to group them into categories if reasonable). Compare how this ranking differs if we focus on the total amount of revenue generated.

parking_violations <- read_csv("data/parking/parkingNYC_Jan2020-Jan2021.csv")
violation_code_fine <- read_excel("data/parking/ParkingViolationCodes_January2020.xlsx", sheet = 1)
# Check violation codes
sort(unique(parking_violations$`Violation Code`))
##  [1]  0  1  2  3  4  5  6  8  9 10 11 13 14 16 17 18 19 20 21 22 23 24 25 26 27
## [26] 28 29 30 31 33 35 37 38 39 40 41 42 44 45 46 47 48 49 50 51 52 53 54 55 56
## [51] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
## [76] 83 84 85 86 87 89 91 95 96 97 98 99
sort(unique(violation_code_fine$`VIOLATION CODE`))
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## [76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 96 97 98 99
length(sort(unique(parking_violations$`Violation Code`)))
## [1] 87
length(sort(unique(violation_code_fine$`VIOLATION CODE`)))
## [1] 97
# Looks like we have more violation code in Excel files
mydata <- left_join(parking_violations, violation_code_fine, by = c("Violation Code" = "VIOLATION CODE"))
head(mydata)
## Warning in fansi::strwrap_ctl(x, width = max(width, 0), indent = indent, :
## Encountered a C0 control character, see `?unhandled_ctl`; you can use
## `warn=FALSE` to turn off these warnings.
## # A tibble: 6 x 50
##   `Summons Number` `Plate ID` `Registration State` `Plate Type` `Issue Date`
##              <dbl> <chr>      <chr>                <chr>        <chr>       
## 1       1474094223 KDT3875    NY                   PAS          06/24/2020  
## 2       1474094600 GTW5034    NY                   PAS          06/26/2020  
## 3       1474097807 JEF4856    NY                   PAS          06/26/2020  
## 4       1474097832 HKR3624    NY                   PAS          06/19/2020  
## 5       1474100089 HJZ0180    PA                   PAS          06/22/2020  
## 6       1474100090 HJZ0180    PA                   PAS          06/22/2020  
## # … with 45 more variables: Violation Code <dbl>, Vehicle Body Type <chr>,
## #   Vehicle Make <chr>, Issuing Agency <chr>, Street Code1 <dbl>,
## #   Street Code2 <dbl>, Street Code3 <dbl>, Vehicle Expiration Date <dbl>,
## #   Violation Location <chr>, Violation Precinct <dbl>, Issuer Precinct <dbl>,
## #   Issuer Code <dbl>, Issuer Command <chr>, Issuer Squad <chr>,
## #   Violation Time <chr>, Time First Observed <chr>, Violation County <chr>,
## #   Violation In Front Of Or Opposite <chr>, House Number <chr>,
## #   Street Name <chr>, Intersecting Street <chr>, Date First Observed <dbl>,
## #   Law Section <dbl>, Sub Division <chr>, Violation Legal Code <lgl>,
## #   Days Parking In Effect <chr>, From Hours In Effect <chr>,
## #   To Hours In Effect <chr>, Vehicle Color <chr>, Unregistered Vehicle? <dbl>,
## #   Vehicle Year <dbl>, Meter Number <chr>, Feet From Curb <dbl>,
## #   Violation Post Code <lgl>, Violation Description <lgl>,
## #   No Standing or Stopping Violation <lgl>, Hydrant Violation <lgl>,
## #   Double Parking Violation <lgl>, issue_date <date>, year <dbl>, month <dbl>,
## #   day <dbl>, VIOLATION DESCRIPTION <chr>, Manhattan  96th St. & below
## #   (Fine Amount $) <dbl>, All Other Areas
 (Fine Amount $) <dbl>
summary1 <- mydata %>%
  group_by(`VIOLATION DESCRIPTION`) %>%
  tally() %>%
  ungroup() %>%
  arrange(desc(n)) %>%
  top_n(10, wt = n)
summary1 %>%
  ggplot(aes(x = reorder(`VIOLATION DESCRIPTION`, n), y = n, fill = `VIOLATION DESCRIPTION`))+
  geom_bar(stat = "identity")+
  scale_y_continuous("Frequency", labels = scales::comma)+
  coord_flip()+
  guides(fill = FALSE)+
  labs(title = "Top 10 Parking Violations in NYC 2020", 
       caption = "Source: NYC Open Data")+
  theme_fivethirtyeight()+
  theme(plot.title = element_text(hjust = 0, size = 11))

To better calculate the total amount of revenue generated. It is necessary to dig into where those tickets are issued. There is an variable Violation Precinct, which indicates the precinct that violation happens. Due to the fact that the fine amount differs depending on whether it happens at Manhattan 96th St. & below or All Other Areas. I am going to identify the fine amount based on the Violation Precinct. You can check this article Smoking Marijuana in Public: The Spatial and Policy Shift in New York City Arrests, 1992-2003.

below_96 <- c(1, 5, 7, 6, 9, 10, 13, 14, 17, 18, 19, 20, 22, 23, 24)
colnames(mydata)[49] <- "manhattan_96_below"
colnames(mydata)[50] <- "all_other_areas"
colnames(mydata)[15] <- "violation_precinct"
colnames(mydata)[16] <- "issuer_precinct"
# Decide fine amount based on violation precinct
mydata$fine <- ifelse(mydata$violation_precinct %in% below_96, 
                     mydata$manhattan_96_below, 
                     mydata$all_other_areas)
summary2 <- mydata %>%
  filter(!is.na(`VIOLATION DESCRIPTION`)) %>%
  group_by(`VIOLATION DESCRIPTION`) %>%
  summarise(total_revenue = sum(fine)) %>%
  ungroup() %>%
  arrange(desc(total_revenue)) %>%
  top_n(10, wt = total_revenue)
summary2 %>%
  ggplot(aes(x = reorder(`VIOLATION DESCRIPTION`, total_revenue), 
             y = total_revenue, 
             fill = `VIOLATION DESCRIPTION`))+
  geom_bar(stat = "identity")+
  labs(title = "TOP 10 Most-Paying Parking Violations in NYC in 2020", 
       caption = "Source: NYC Open Data")+
  scale_y_continuous(labels = scales::comma)+
  coord_flip()+
  theme_fivethirtyeight()+
  guides(fill = FALSE)+
  theme(plot.title = element_text(size = 11, hjust = 0))

# Remove those with no violation description
mydata <- mydata %>%
  filter(!is.na(`VIOLATION DESCRIPTION`))
b) Average amount of fine by vehicle

Compare the average amount of fine by vehicle color, vehicle year, and vehicle plate type [Hint: it is sufficient to restrict your attention to commercial (COM) and passenger (PAS) vehicles]? Briefly describe your findings.

Let’s first deal with vehicle colors.

# Check Unique Vehicle Colors in our dataset
sort(unique(mydata$`Vehicle Color`))
##   [1] ".  Y"  ".-"    "B"     "B ACK" "B L"   "B LAC" "BAIGE" "BALCK" "BBRW" 
##  [10] "BCK"   "BCL"   "BEGE"  "BEIG"  "BEIGE" "BEING" "BERG"  "BERGU" "BG"   
##  [19] "BGE"   "BI"    "BIEGE" "BIERG" "BK"    "BK/"   "BKG"   "BKJ"   "BKT"  
##  [28] "BKW"   "BL"    "BL/"   "BLA"   "BLAC"  "BLACK" "BLAK"  "BLB"   "BLC"  
##  [37] "BLCK"  "BLE"   "BLG"   "BLK"   "BLO"   "BLU"   "BLUE"  "BLW"   "BLY"  
##  [46] "BN"    "BR"    "BRCH"  "BRG"   "BRIGE" "BRK"   "BRN"   "BRNW"  "BRO"  
##  [55] "BRON"  "BRONZ" "BROW"  "BROWN" "BRW"   "BRWN"  "BRWON" "BU"    "BUG"  
##  [64] "BUL"   "BUR"   "BURG"  "BURGE" "BURGU" "BURUN" "BW"    "BWN"   "BY"   
##  [73] "CREAM" "DARKG" "DARKR" "DEEP"  "DK/"   "DKB"   "DKG"   "DKGN"  "DKGRA"
##  [82] "DKGRY" "DKGY"  "DKM"   "DKP"   "DKR"   "DRKGR" "FAN"   "FR"    "G"    
##  [91] "GARY"  "GAY"   "GD"    "GERY"  "GEY"   "GL"    "GLD"   "GLO"   "GN"   
## [100] "GOLD"  "GR"    "GR/"   "GRA"   "GRAAY" "GRANG" "GRAY"  "GRE"   "GRE Y"
## [109] "GREE"  "GREEN" "GREG"  "GREN"  "GREU"  "GREY"  "GRFAY" "GRG"   "GRN"  
## [118] "GRRY"  "GRT"   "GRU"   "GRW"   "GRY"   "GV"    "GW"    "GY"    "GY/"  
## [127] "GYB"   "GYG"   "GYT"   "HITE"  "HRY"   "INFIN" "LAVEN" "LIGHT" "LT/"  
## [136] "LTB"   "LTBLU" "LTG"   "LTP"   "LTT"   "M"     "MAR"   "MARO"  "MAROO"
## [145] "MC"    "METAL" "MN"    "MR"    "MRG"   "MULTI" "MUNI"  "NAVY"  "NO"   
## [154] "NOC"   "NVBLU" "NVYBL" "NYPD"  "OLIVE" "OPTER" "OR"    "ORA"   "ORANG"
## [163] "ORG"   "ORGE"  "ORN"   "OTHER" "PEARL" "PINK"  "PK"    "PKGRY" "PR"   
## [172] "PRPL"  "PURP"  "PURPL" "QBLK"  "R"     "RD"    "RD/"   "RDG"   "RDT"  
## [181] "RDW"   "RE"    "RED"   "REG"   "RUST"  "S"     "SIL"   "SILER" "SILV" 
## [190] "SILVE" "SILVR" "SIV"   "SIVER" "SIVR"  "SL"    "SLATE" "SLIVE" "SLR"  
## [199] "SLV"   "SLVER" "SLVR"  "SR"    "SUBUR" "SV"    "SY"    "TAN"   "TEAL" 
## [208] "TN"    "TN/"   "TNG"   "TX"    "U"     "W"     "WH"    "WH/"   "WH/BL"
## [217] "WHB"   "WHE"   "WHG"   "WHI"   "WHIE"  "WHIT"  "WHITE" "WHITW" "WHO"  
## [226] "WHT"   "WHTE"  "WHTG"  "WHTIE" "WI"    "WN"    "WT"    "WTE"   "WTH"  
## [235] "WWT"   "Y"     "YE"    "YEL"   "YELL"  "YELLO" "YL"    "YLL"   "YLLW" 
## [244] "YLW"   "YN"    "YU"    "YW"
# First remove replace those with no clear color code as NA
mydata$`Vehicle Color` <- ifelse(mydata$`Vehicle Color` == ".-", 
                                 NA, 
                                 mydata$`Vehicle Color`)
# Remove Special Characters
mydata$`Vehicle Color` <- str_replace_all(string = mydata$`Vehicle Color`, 
                                          pattern = char_class("./ "), 
                                          replacement = "")
sort(unique(mydata$`Vehicle Color`))
##   [1] "B"     "BACK"  "BAIGE" "BALCK" "BBRW"  "BCK"   "BCL"   "BEGE"  "BEIG" 
##  [10] "BEIGE" "BEING" "BERG"  "BERGU" "BG"    "BGE"   "BI"    "BIEGE" "BIERG"
##  [19] "BK"    "BKG"   "BKJ"   "BKT"   "BKW"   "BL"    "BLA"   "BLAC"  "BLACK"
##  [28] "BLAK"  "BLB"   "BLC"   "BLCK"  "BLE"   "BLG"   "BLK"   "BLO"   "BLU"  
##  [37] "BLUE"  "BLW"   "BLY"   "BN"    "BR"    "BRCH"  "BRG"   "BRIGE" "BRK"  
##  [46] "BRN"   "BRNW"  "BRO"   "BRON"  "BRONZ" "BROW"  "BROWN" "BRW"   "BRWN" 
##  [55] "BRWON" "BU"    "BUG"   "BUL"   "BUR"   "BURG"  "BURGE" "BURGU" "BURUN"
##  [64] "BW"    "BWN"   "BY"    "CREAM" "DARKG" "DARKR" "DEEP"  "DK"    "DKB"  
##  [73] "DKG"   "DKGN"  "DKGRA" "DKGRY" "DKGY"  "DKM"   "DKP"   "DKR"   "DRKGR"
##  [82] "FAN"   "FR"    "G"     "GARY"  "GAY"   "GD"    "GERY"  "GEY"   "GL"   
##  [91] "GLD"   "GLO"   "GN"    "GOLD"  "GR"    "GRA"   "GRAAY" "GRANG" "GRAY" 
## [100] "GRE"   "GREE"  "GREEN" "GREG"  "GREN"  "GREU"  "GREY"  "GRFAY" "GRG"  
## [109] "GRN"   "GRRY"  "GRT"   "GRU"   "GRW"   "GRY"   "GV"    "GW"    "GY"   
## [118] "GYB"   "GYG"   "GYT"   "HITE"  "HRY"   "INFIN" "LAVEN" "LIGHT" "LT"   
## [127] "LTB"   "LTBLU" "LTG"   "LTP"   "LTT"   "M"     "MAR"   "MARO"  "MAROO"
## [136] "MC"    "METAL" "MN"    "MR"    "MRG"   "MULTI" "MUNI"  "NAVY"  "NO"   
## [145] "NOC"   "NVBLU" "NVYBL" "NYPD"  "OLIVE" "OPTER" "OR"    "ORA"   "ORANG"
## [154] "ORG"   "ORGE"  "ORN"   "OTHER" "PEARL" "PINK"  "PK"    "PKGRY" "PR"   
## [163] "PRPL"  "PURP"  "PURPL" "QBLK"  "R"     "RD"    "RDG"   "RDT"   "RDW"  
## [172] "RE"    "RED"   "REG"   "RUST"  "S"     "SIL"   "SILER" "SILV"  "SILVE"
## [181] "SILVR" "SIV"   "SIVER" "SIVR"  "SL"    "SLATE" "SLIVE" "SLR"   "SLV"  
## [190] "SLVER" "SLVR"  "SR"    "SUBUR" "SV"    "SY"    "TAN"   "TEAL"  "TN"   
## [199] "TNG"   "TX"    "U"     "W"     "WH"    "WHB"   "WHBL"  "WHE"   "WHG"  
## [208] "WHI"   "WHIE"  "WHIT"  "WHITE" "WHITW" "WHO"   "WHT"   "WHTE"  "WHTG" 
## [217] "WHTIE" "WI"    "WN"    "WT"    "WTE"   "WTH"   "WWT"   "Y"     "YE"   
## [226] "YEL"   "YELL"  "YELLO" "YL"    "YLL"   "YLLW"  "YLW"   "YN"    "YU"   
## [235] "YW"
# Create color vectors
black <- c("B", "BACK", "BALCK", "BCK", "BK", "BLA", "BLAC", "BLACK", "BLAK", "BLC", 
           "BLCK", "BLK", "DK")
blue <- c("BL", "BU", "BLU", "BLUE", "BUL", "LTB", "LTBLU", "NVBLU", "NVYBL", "NAVY", "TEAL", 
          "WHB", "WHBL", "DKB")
brown <- c("BN", "BR", "BRN", "BBRW", "BRNW", "BRO", "BRON", "BRONZ", "BROW", "BROWN", 
           "BRW", "BRWN", "BRWON", "BW", "BWN", "TAN", "TN", "TNG")
white <- c("W", "WH", "WHE", "WHI", "WHIE", "WHIT", "WHITE","WHITW", 
           "WHO", "WHT", "WHTE", "WHTIE", "WI", 
           "WT", "WTE", "WTH", "WWT", "PEARL")
yellow <- c("Y", "YE", "YEL", "YELL", "YELLO", "YL", "YLL", "YLLW", "YLW", "YW", 
            "GL", "GLD", "GLO", "GOLD", "GD")
green <- c("GRE", "GREE", "GREEN", "GREG", "GREN", "GREU", "WHG", "GN", "GRN", 
           "DKGN", "DKG", "DRKGR", "DARKG","LTG", "OLIVE", "BRG")
orange <- c("OR", "ORA", "ORANG", "ORG", "ORGE", "ORN")
red <- c("R", "RD", "RE", "RED", "RDG", "BUR", "BUG", "BURG", 
         "BURGE", "BURGU", "BURUN", "DARKR", "DKR",
         "MAR", "MARO", "MAROO", "MR")
grey <- c("G", "GARY", "GAY", "GERY", "GEY", "GR", "GRA", "GRAAY", "GRANG", "GRAY", 
          "GREY", "GRFAY", "GRRY", "GRY", "GY", "DKGRY", "DKGY")
sliver <- c("S", "SIL", "SILER", "SILV", "SILVE", 
            "SILVR", "SIV", "SIVER", "SIVR", "SL", "SLATE", "SLIVE", "SLR", "SLV", 
            "SLVER", "SLVR", "SR", "SV")
purple <- c("PR", "PRPL", "PURP", "PURPL", "LAVEN", "PURPLE")
pink <- c("PINK", "PK", "PKGRY", "DKP", "LTP")
beige <- c("BAIGE", "BEGE", "BEIG", "BEIGE", "BEING", "BERG", "BERGU", "BG", "BGE", 
           "BI", "BIEGE", "BIERG", "BRIGE")
# Replace color
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(black) %R% END, 
                                      replacement = "black")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(blue) %R% END, 
                                      replacement = "blue")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(brown) %R% END, 
                                      replacement = "brown")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(white) %R% END, 
                                      replacement = "white")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(yellow) %R% END, 
                                      replacement = "yellow")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(green) %R% END, 
                                      replacement = "green")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(orange) %R% END, 
                                      replacement = "orange")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(red) %R% END, 
                                      replacement = "red")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(grey) %R% END, 
                                      replacement = "grey")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(sliver) %R% END, 
                                      replacement = "sliver")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(purple) %R% END, 
                                      replacement = "purple")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(pink) %R% END, 
                                      replacement = "pink")
mydata$`Vehicle Color` <- str_replace(string = mydata$`Vehicle Color`, 
                                      pattern = START %R% or1(beige) %R% END, 
                                      replacement = "beige")
sort(unique(mydata$`Vehicle Color`))
##  [1] "BCL"    "beige"  "BKG"    "BKJ"    "BKT"    "BKW"    "black"  "BLB"   
##  [9] "BLE"    "BLG"    "BLO"    "blue"   "BLW"    "BLY"    "BRCH"   "BRK"   
## [17] "brown"  "BY"     "CREAM"  "DEEP"   "DKGRA"  "DKM"    "FAN"    "FR"    
## [25] "green"  "grey"   "GRG"    "GRT"    "GRU"    "GRW"    "GV"     "GW"    
## [33] "GYB"    "GYG"    "GYT"    "HITE"   "HRY"    "INFIN"  "LIGHT"  "LT"    
## [41] "LTT"    "M"      "MC"     "METAL"  "MN"     "MRG"    "MULTI"  "MUNI"  
## [49] "NO"     "NOC"    "NYPD"   "OPTER"  "orange" "OTHER"  "pink"   "purple"
## [57] "QBLK"   "RDT"    "RDW"    "red"    "REG"    "RUST"   "sliver" "SUBUR" 
## [65] "SY"     "TX"     "U"      "white"  "WHTG"   "WN"     "yellow" "YN"    
## [73] "YU"
selected_colors <- c("black", "blue", "brown", "white", "yellow", "green", 
            "orange", "red", "grey", "sliver", "purple", "pink", "beige")
mydata2 <- mydata %>%
  filter(`Vehicle Color` %in% selected_colors)
mydata2 %>%
  group_by(`Vehicle Color`) %>%
  summarise(average_fine = mean(fine)) %>%
  ungroup() %>%
  arrange(desc(average_fine)) %>%
  mutate(`Vehicle Color` = str_to_title(`Vehicle Color`)) %>%
  ggplot(aes(x = reorder(`Vehicle Color`, -average_fine), y = average_fine, 
             fill = `Vehicle Color`))+
  geom_bar(stat = "identity")+
  scale_fill_manual(values = c("Brown" = "#763626", "Green"="#3F681C", 
                               "Pink"="pink", "Yellow"="#F9DC24", "White"="white", 
                               "Orange"="#F0810F", "Red"="red", "Sliver"="#C0C0C0", 
                               "Purple"="#800080", "Black"="black", 
                               "Grey"="grey60", "Beige"="#CFB997", "Blue"="#4CB5F5"))+
  labs(title = "Average Fine Amount vs. Vehicle Color", 
       x = "Vehicle Color", y = "Average Fine Amount")+
  theme_fivethirtyeight()+
  guides(fill = FALSE)+
  theme(plot.title = element_text(size = 11, hjust = 0))

Then, we deal with Average Fine Amount v.s Vehicle Year

mydata2 %>%
  filter(`Vehicle Year`> 0 & `Vehicle Year` <= 2020) %>%
  group_by(`Vehicle Year`) %>%
  summarise(average_fine = mean(fine)) %>%
  ungroup() %>%
  ggplot(aes(x = `Vehicle Year`, y = average_fine))+
  geom_line()+
  labs(title = "Average Fine Amount vs. Vehicle Year")+
  theme_fivethirtyeight()+
  theme(plot.title = element_text(size = 11, face = "bold", hjust = 0.5))

Finally, we look at Average Fine Amount vs. Vehicle Plate Type

mydata2 %>%
  filter(`Plate Type` %in% c("COM", "PAS")) %>%
  group_by(`Plate Type`) %>%
  summarise(average_fine = mean(fine)) %>%
  ungroup() %>%
  ggplot(aes(x = `Plate Type`, y = average_fine, fill = `Plate Type`))+
  geom_bar(stat = "identity")+
  guides(fill = FALSE)+
  scale_x_discrete(labels = c("Commerical", "Passenger"))+
  scale_y_continuous(limits = c(0, 100))+
  ggtitle("Average Fine Amount vs. Plate Type")+
  theme_fivethirtyeight()+
  theme(plot.title = element_text(size = 11, face = "bold", hjust = 0.5))

c) Effect of COVID

Let’s see if we can observe the effect of COVID restrictions on parking violations. Present a visualization that shows how parking violations changed after the New York statewide stay-at-home order on March 14, 2020. Make sure the visualization clearly highlights the main pattern (the COVID effect).

The Issue Date is not a in standard POSIXlt form, so we need to change it first.

mydata$`Issue Date` <- lubridate::mdy(mydata$`Issue Date`)
class(mydata$`Issue Date`)
## [1] "Date"
mydata %>%
  filter(`Issue Date` >= as.Date("2020-01-01") & `Issue Date` <= as.Date("2021-01-01")) %>%
  mutate(month = lubridate::month(`Issue Date`), 
         year = lubridate::year(`Issue Date`), 
         period = str_c(month, year, sep = "-")) %>%
  mutate(period = lubridate::my(period)) %>%
  group_by(period) %>%
  tally() %>%
  ungroup() %>%
  ggplot(aes(x = period, y = n, group = 1))+
  geom_line()+
  geom_segment(x = as.Date("2020-03-14"), 
               xend = as.Date("2020-03-14"),
               y = 0, 
               yend = 340000,
               color = "blue", linetype = "dashed")+
  scale_y_continuous(labels = scales::comma)+
  geom_text(label = "NYC Stay-at-Home Order", x = as.Date("2020-03-14"), y = 350000, 
            color = "red")+
  labs(title = "Number of Parking Violations in NYC, 2020", 
       caption = "Source: NYC Open Data")+
  theme_fivethirtyeight()+
  theme(plot.title = element_text(size = 11, hjust = 0))

# Remove some vectors
remove(beige, below_96, black, blue, brown, green, grey, orange, pink, purple, 
       sliver, white, yellow, red, selected_colors)

2. Map by Precincts

Read in the shape files for the police precincts and remove all precincts outside of Manhattan.

a) Number of tickets, total fines, and average fines

Provide three maps that show choropleth maps of:

  • the total number of tickets
  • the total amount of fines
  • the average amount of fines

Briefly describe what you learn from these maps in comparison.

register_google(key = keyring::key_get("MY_GOOGLE_API"))
geocode("Manhattan, New York City", source = "google")
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Manhattan,+New+York+City&key=xxx
## # A tibble: 1 x 2
##     lon   lat
##   <dbl> <dbl>
## 1 -74.0  40.8
# Download Map from Stamen Server
nyc_map <- get_map(c(lon = -73.97125, lat = 40.78306), 
                   zoom = 12, 
                   maptype = "toner", 
                   source = "stamen")
## Source : https://maps.googleapis.com/maps/api/staticmap?center=40.78306,-73.97125&zoom=12&size=640x640&scale=2&maptype=terrain&key=xxx
## Source : http://tile.stamen.com/toner/12/1205/1537.png
## Source : http://tile.stamen.com/toner/12/1206/1537.png
## Source : http://tile.stamen.com/toner/12/1207/1537.png
## Source : http://tile.stamen.com/toner/12/1205/1538.png
## Source : http://tile.stamen.com/toner/12/1206/1538.png
## Source : http://tile.stamen.com/toner/12/1207/1538.png
## Source : http://tile.stamen.com/toner/12/1205/1539.png
## Source : http://tile.stamen.com/toner/12/1206/1539.png
## Source : http://tile.stamen.com/toner/12/1207/1539.png
## Source : http://tile.stamen.com/toner/12/1205/1540.png
## Source : http://tile.stamen.com/toner/12/1206/1540.png
## Source : http://tile.stamen.com/toner/12/1207/1540.png
# Read Shapefile
nyc_police_precincts <- readOGR("data/police_precincts/nypp.shp", "nypp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/yifei/Documents/datacamp/data_viz_review/homework_samples/07_parking-graded/data/police_precincts/nypp.shp", layer: "nypp"
## with 77 features
## It has 3 fields
shapefile <- spTransform(nyc_police_precincts, CRS("+proj=longlat +datum=WGS84"))
shapefile_df <- fortify(shapefile, region = "Precinct")
shapefile_df$id <- as.numeric(shapefile_df$id)
manhattan <- shapefile_df[shapefile_df$id <=34, ]
summary3 <- mydata %>%
  filter(violation_precinct > 0 & violation_precinct <= 34) %>%
  group_by(violation_precinct) %>%
  summarise(n = n(), 
            total_revenue = sum(fine), 
            average_fine = mean(fine)) %>%
  ungroup()
# Combine data
manhattan <- manhattan %>%
  inner_join(summary3, by = c("id" = "violation_precinct"))
ggmap(nyc_map)+
  geom_polygon(data = manhattan, 
               aes(x = long, y = lat, group = group, fill = n/1000), color = "#FFFFFF")+
  scale_fill_distiller(palette = "YlGn", 
                       direction = "horizontal", 
                       guide = "colourbar", 
                       breaks = seq(0, 280, by = 40), 
                       limits = c(0, 280))+
  labs(title = "Total Number of Parking Violations in Manhattan, 2020")+
  guides(fill = guide_colorbar(title = "Total Number of Parking Violations (in thousands)", 
                               barwidth = 15, 
                               barheight = 1))+
  theme_map()+
  theme(legend.position = "bottom")

ggmap(nyc_map)+
  geom_polygon(data = manhattan, 
               aes(x = long, y = lat, group = group, fill = total_revenue/10000), 
               color = "#FFFFFF")+
  scale_fill_distiller(palette = "OrRd", 
                       direction = "horizontal", 
                       guide = "colourbar", 
                       breaks = seq(0, 2500, by = 500), 
                       limits = c(0, 2500))+
  labs(title = "Total Amount of Fines in Manhattan by Police Precinct, 2020")+
  guides(fill = guide_colorbar(title = "Total Amount of Fines (in 10 thousands)", 
                               barwidth = 15, 
                               barheight = 1))+
  theme_map()+
  theme(legend.position = "bottom")

ggmap(nyc_map)+
  geom_polygon(data = manhattan, 
               aes(x = long, y = lat, group = group, fill = average_fine), 
               color = "#FFFFFF")+
  scale_fill_distiller(palette = "Blues", 
                       direction = "horizontal", 
                       guide = "colourbar", 
                       breaks = seq(70, 100, by = 5), 
                       limits = c(70, 100))+
  labs(title = "Average Amount of Fines in Manhattan by Police Precinct, 2020")+
  guides(fill = guide_colorbar(title = "Average Amount of Fines", 
                               barwidth = 15, 
                               barheight = 1))+
  theme_map()+
  theme(legend.position = "bottom")

b) Types of violations

Group the almost 100 types of ticket violations into a smaller set of 4-6 subgroups (where other should be the remainder of violations not included in other groups you defined). [Hint: No need to spend more than 5 minutes thinking about what the right grouping is.]. Provide choropleth maps for each of these subgroups to show where different types of violations are more or less common.

manhanttan_code <- as.vector(sort(unique(manhattan$id)))
mydata3 <- mydata %>%
  filter(violation_precinct %in% manhanttan_code & !is.na(`VIOLATION DESCRIPTION`))
violation_descriptions <- unique(mydata$`VIOLATION DESCRIPTION`)
no_standing <- violation_descriptions[str_detect(violation_descriptions, 
                         pattern = START %R% "NO" %R% SPC %R% "STAND" %R% or("ING", " "))]
no_parking <-  violation_descriptions[str_detect(violation_descriptions, 
                                                 pattern = START %R% "NO" %R% SPC %R% "PARKING")]
fire_hydrant <- violation_descriptions[str_detect(violation_descriptions, 
                                                  pattern = START %R% "FIRE" %R% SPC %R% "HYDRANT")]
mydata3$`VIOLATION DESCRIPTION` <- ifelse(mydata3$`VIOLATION DESCRIPTION` %in% no_standing, 
                                          "No Standing", 
                                          mydata3$`VIOLATION DESCRIPTION`)
mydata3$`VIOLATION DESCRIPTION` <- ifelse(mydata3$`VIOLATION DESCRIPTION` %in% no_parking, 
                                          "No Parking", 
                                          mydata3$`VIOLATION DESCRIPTION`)
mydata3$`VIOLATION DESCRIPTION` <- ifelse(mydata3$`VIOLATION DESCRIPTION` %in% fire_hydrant, 
                                          "Fire Hydrant", 
                                          mydata3$`VIOLATION DESCRIPTION`)
mydata3$`VIOLATION DESCRIPTION` <- ifelse(mydata3$`VIOLATION DESCRIPTION` %in% c("No Standing", 
                                                                                 "No Parking", 
                                                                                 "Fire Hydrant"), 
                                          mydata3$`VIOLATION DESCRIPTION`, 
                                          "Others")
summary4 <- mydata3 %>%
  group_by(violation_precinct, `VIOLATION DESCRIPTION`) %>%
  tally() %>%
  ungroup() %>%
  rename(violation_number = n)
# Merge Data Set Again
manhattan2 <- shapefile_df[shapefile_df$id <=34, ]
manhattan2 <- manhattan2 %>%
  left_join(summary4, by = c("id" = "violation_precinct"))
ggmap(nyc_map, base_layer = ggplot(data = manhattan2, aes(x = long, y = lat)))+
  geom_polygon(aes(fill = log(violation_number), group = id), color = "white")+
  scale_fill_distiller(palette = "YlGn", 
                       direction = "horizontal")+
  ggtitle("Number of Violations grouped by Types in NYC, 2020")+
  labs(caption = "Source: NYC Open Data")+
  theme_map()+
  guides(fill = FALSE)+
  theme(plot.title = element_text(size = 11, face = "bold", hjust = 0.5))+
  facet_wrap(~`VIOLATION DESCRIPTION`)

3. Focus on the Upper East

Precinct 19 identifies the Upper East Side. The data currently does not provide latitude and longitude of the violation locations (and I am not sure what these street_code variables are for).

a) Ignoring fire hydrants

Restrict your data to parking violations related to fire hydrants (Violation Code = 40). Using the variables Street Name and House Number as well as the knowledge that these addresses are in the Upper East Side of Manhattan, geocode at least 500 addresses. Include a data table of these addresses and the latitude and longitude of these addresses in the output.

combined_data <- readRDS("data/combined_dataset.rds")

I’am not going to show Data Table here. You can simply call it use datatable() function.

b) Interactive Map

Provide an interactive map of the violations you geocoded using leaflet. Provide at least three pieces of information on the parking ticket in a popup.

combined_data$label <- paste("Address: ", combined_data$address, "<br>", 
                             "Plate ID: ", combined_data$`Plate ID`, "<br>", 
                             "Issue Date: ", combined_data$`Issue Date`)
leaflet(options = leafletOptions(dragging = FALSE,
                                 minZoom = 14,
                                 maxZoom = 18)) %>%
  setView(lng = mean(combined_data$lon), 
          lat = mean(combined_data$lat), 
          zoom = 14) %>%
  addProviderTiles("CartoDB") %>%
  addCircles(data = combined_data, 
                   label = ~lapply(label, HTML)) %>%
  addPolygons(data = subset(shapefile, Precinct == 19), 
              color = "#FF0000", 
              smoothFactor = 0.5, 
              fill = NA)
## Assuming "lon" and "lat" are longitude and latitude, respectively
c) Luxury cars and repeat offenders

Using the vehicle Plate ID, identify repeat offenders (in the full dataset). Create another variable called luxury_car in which you identify luxury car brands using the Vehicle Make variable.

Start with the previous map. Distinguish the points by whether the car is a repeat offender and/or luxury car. Add a legend informing the user about the color scheme. Also make sure that the added information about the car type and repeat offender status is now contained in the popup information. Show this map.

sort(unique(combined_data$`Vehicle Make`))
##  [1] "ACURA" "AUDI"  "BMW"   "BUICK" "CADIL" "CHEVR" "CHRYS" "CITRO" "DODGE"
## [10] "FIAT"  "FORD"  "FRUEH" "GMC"   "HARLE" "HIN"   "HONDA" "HUMME" "HYUND"
## [19] "INFIN" "INTER" "ISUZU" "JAGUA" "JEEP"  "JENSE" "KENWO" "KIA"   "LAMBO"
## [28] "LEXUS" "LINCO" "MACK"  "MAZDA" "ME/BE" "MINI"  "MITSU" "NISSA" "NIU"  
## [37] "NS/OT" "PONTI" "PORSC" "ROLLS" "ROVER" "SAAB"  "SPRI"  "SUBAR" "SUZUK"
## [46] "TESLA" "TOYOT" "UD"    "VOLKS" "VOLVO"
luxury_car_brands <- c("ACURA", "AUDI", "BMW", "CADIL", "CHRYS", "HONDA", "PORSC", 
                       "TESLA", "LAMBO", "LINCO", "LEXUS", "ME/BE", "ROVER", "GMC", 
                       "JAGUA", "ROLLS", "HARLE")
mydata4 <- combined_data %>%
  group_by(`Plate ID`) %>%
  mutate(repeat_times = n()) %>%
  ungroup() %>%
  mutate(luxury_car = ifelse(combined_data$`Vehicle Make` %in% luxury_car_brands, 
                                   "Luxury Car", 
                                   "Normal Car")) %>%
  arrange(desc(repeat_times))
pal <- colorFactor(palette = c("#FB6542", "#375E97"), 
                   levels = c("Luxury Car", "Normal Car"))
mydata4$label <- paste("Address: ", mydata4$address, "<br>", 
                       "Plate ID: ", mydata4$`Plate ID`, "<br>",
                       "Issue Date: ", mydata4$`Issue Date`, "<br>",
                       "Car Brand: ", mydata4$`Vehicle Make`, "<br>", 
                       "Repeat Times: ", mydata4$repeat_times)
leaflet(options = leafletOptions(minZoom = 14,
                                 maxZoom = 18)) %>%
  setView(lng = mean(mydata4$lon), 
          lat = mean(mydata4$lat), 
          zoom = 14) %>%
  addProviderTiles("CartoDB") %>%
  addCircleMarkers(data = filter(mydata4, repeat_times <= 1), 
             color = ~pal(luxury_car), 
             label = ~lapply(label, HTML),
             labelOptions = list(textsize = "14px"),
             radius = 1,
             group = "Non-Repeat Offender") %>%
  addCircleMarkers(data = filter(mydata4, repeat_times > 1), 
             color = ~pal(luxury_car), 
             label = ~lapply(label, HTML),
             labelOptions = list(textsize = "14px"),
             radius = 1,
             group = "Repeat Offender") %>%
  addPolygons(data = subset(shapefile, Precinct == 19), 
              color = "#FF0000", 
              smoothFactor = 0.5, 
              fill = NA) %>%
  addLegend(position = "bottomright", 
            pal = pal, 
            values = c("Luxury Car", "Normal Car"), 
            title = "Luxury Car Status", 
            opacity = 0.5) %>%
  addLayersControl(overlayGroups = c("Non-Repeat Offender", "Repeat Offender"), 
                   position = "topright", 
                   options = layersControlOptions(collapsed = FALSE))
## Assuming "lon" and "lat" are longitude and latitude, respectively
## Assuming "lon" and "lat" are longitude and latitude, respectively
d) Cluster

Add marker clustering, so that zooming in will reveal the individual locations but the zoomed out map only shows the clusters. Show the map with clusters.

leaflet(options = leafletOptions(minZoom = 14,
                                 maxZoom = 18)) %>%
  setView(lng = mean(mydata4$lon), 
          lat = mean(mydata4$lat), 
          zoom = 14) %>%
  addProviderTiles("CartoDB") %>%
  addCircleMarkers(data = filter(mydata4, repeat_times <= 1), 
             color = ~pal(luxury_car), 
             label = ~lapply(label, HTML), 
             radius = 1,
             group = "Non-Repeat Offender", 
             clusterOptions = markerClusterOptions()) %>%
  addCircleMarkers(data = filter(mydata4, repeat_times > 1), 
             color = ~pal(luxury_car),
             label = ~lapply(label, HTML),
             radius = 1,
             group = "Repeat Offender", 
             clusterOptions = markerClusterOptions()) %>%
  addPolygons(data = subset(shapefile, Precinct == 19), 
              color = "#FF0000", 
              smoothFactor = 0.5, 
              fill = NA) %>%
  addLegend(position = "bottomright", 
            pal = pal, 
            values = c("Luxury Car", "Normal Car"), 
            title = "Luxury Car Status", 
            opacity = 0.5) %>%
  addLayersControl(overlayGroups = c("Non-Repeat Offender", "Repeat Offender"), 
                   position = "topright", 
                   options = layersControlOptions(collapsed = FALSE))
## Assuming "lon" and "lat" are longitude and latitude, respectively
## Assuming "lon" and "lat" are longitude and latitude, respectively

Submission

Please follow the instructions to submit your homework. The homework is due on Wednesday, March 17.

Please stay honest!

If you do come across something online that provides part of the analysis / code etc., please no wholesale copying of other ideas. We are trying to evaluate your abilities to visualized data not the ability to do internet searches. Also, this is an individually assigned exercise – please keep your solution to yourself.